Abstract. As available intelligence data and information expand in both quantity and variety, new techniques must be deployed for search and analytics. One technique involves the semantic enhancement of data through the creation of what are called ontologies or controlled vocabularies. When multiple different bodies of heterogeneous data are tagged by means of terms from common ontologies, these data become linked together in ways that allow more effective retrieval and integration. We describe a simple case study to show how these benefits are being achieved, and we describe our strategy for developing a suite of ontologies to serve the needs of the warfighter in the ever more complex battlespace environments of the future.


New Demands for Intelligence Analysts
Intelligence analysts are trained to use their knowledge of available sources to enable querying across huge quantities of rapidly changing data. Already the richness and diversity of these sources make it very difficult for human analysts, even with the most powerful software tools, to leverage their knowledge for analytic purposes. But their problems will only get worse. While conventional intelligence processes have focused primarily on enemy units and on the effects of terrain and weather on military operations, new strategic guidance will require the intelligence community to focus also on disciplines such as cyberwarfare and civil information management [1, 2], and this will imply a massive expansion of the types of information relevant to analysis. The complex operations in which the warfighter of the future will be involved will require mastery not only of vast quantities of network data but also of information pertaining to the entire ecology of daily life in the areas of operation for asymmetric warfare, including information regarding religion, leadership, economics, culture, disease, food, water and other natural resources, and much more. All of this will go hand in hand with a vast expansion of the range of opportunities for the enemy to exploit weaknesses on the side of the warfighter, including weaknesses in our own understanding of this expanded environment of civil/military operations.

This increase in data diversity and volume, and in the velocity of change of data sources, will pose an entirely new set of challenges for intelligence analysts, bringing the need for an approach to automated analytics that can solve the problem of rapid integration of heterogeneous and rapidly changing data in a way that can be reapplied in agile fashion to each new domain. This problem is analogous in some respects to the problem faced by warfighters of previous generations, who were attempting to develop the capability for massing timely and accurate artillery fires by dispersed batteries upon single targets. For massed fires to be possible, dispersed artillery batteries needed the capacity for real-time communication of a sort that would create and sustain a common operational picture that could be constantly updated in light of new developments in the field. A way needed to be found, in other words, to transform dispersed batteries into a single system of what we might today call interoperable modules. The means to achieve this capability, through a new type of governance and training and through the creation of new doctrine in the field of artillery, were forged only in the early years of the last century at Ft. Sill, Oklahoma [3].

Today, we are facing the problem of massing intelligence fires: of bringing all relevant intelligence capabilities to bear on a target of interest in such a way that they, too, can serve as interoperable modules contributing to the development of a single shared, evolving operational picture. In what follows we describe a strategy that is designed to address just one part of this problem, a strategy that is already being applied in the field to aid intelligence analysts working with a very large dynamic (cloud-based) data store to support operational decision-making [4]. The approach is of interest not least because it can be applied not merely to enhance existing data sources but also to build new representations in situ to serve analysts in the field.

Military Ontology

An ontology, in brief, is a set of terms and definitions representing the kinds and structures of entities and relations in some given area of reality. An ontology is thus comparable to a computerized dictionary. But it differs from a dictionary in being built around a logically robust classification of the entities in its domain, of a sort that can be used to enhance computer-based retrieval and integration of salient data.

The methods used today in ontology building include getting clear about what the types of entities are in a shared domain of interest and about the sorts of relations between these entities. These are methods that have been used by commanders and war-planners since the dawn of organized warfare in order to represent the tactical, operational, and strategic-level realities that make up the battlespace (see Figure 1).
The Strategy of Semantic Enhancement (SE)

In the data sources available to the analyst, multiple different terms, formats and data models are used to describe the data. The strategy of SE [6] is a response to the problems created by this diversity; it rests on the use of simple ontologies whose terms are used to tag (or ‘annotate’) source data artifacts in a consistent way. Ontologies built for SE purposes provide a restricted vocabulary that will enable analytics tools to see through the inconsistencies and redundancies in the data. This means providing one term (‘preferred label’) and one definition for each salient type in each domain [7].

As illustrated in Table 1, the terms in an SE ontology are connected together in a simple hierarchy by means of the “is_a” (or subtype) relation. Each term appears only once in this hierarchy, and is associated in a stable way with its parent and child terms in the hierarchy even when new terms or even whole new branches are added to the ontology in the course of time. This stability is important, since the success of the strategy requires ontologies that can be repeatedly reused to annotate many different kinds of data in ways that then serve multiple different analyst communities and thereby contribute to the creation of an ever more comprehensive common operational picture. SE is thus designed to be at the same time more stable and more flexible than the traditional harmonization and integration approaches that, because they are typically based on ad hoc mappings amongst data models, often rapidly degrade in their effectiveness over time.

On the other hand, ontology is no panacea. Indeed, the increasing popularity of ontologies in the wake of the Semantic Web [8] has meant that ontologies, too, are now frequently being created in ad hoc fashion to address specific local data integration needs with little or no attention to the issues of consistency and stability. For SE to work, however, it is important that we find a way, through governance, training and doctrine, to counteract this tendency to ad hoc ontology development by bringing it about that a single evolving suite of consistent ontologies is created through the coordinated effort of multiple communities. Already the return on investment from the initial phase of the work described here has shown that such coordinated effort can bring significant benefits by making visible connections between data that had hitherto been walled off in separate silos.

The Architectural Approach

To this end, the SE ontologies are organized on three levels, with successively greater degrees of flexibility:

• A single, small, domain-neutral Upper-level Ontology (ULO), for which our selected candidate is the Basic Formal Ontology [9].

• Mid-level Ontologies (MLOs), formed by grouping together terms relating to specific domains of warfare, or to specific tasks such as inter-agency information sharing [10].

• Low-level Ontologies (LLOs) focusing on specific domains, for example: EyeColor, HairColor, Name.

The terms used in these ontologies represent what is general or repeatable in reality at successively more specific levels. The level of an ontology is determined by the degree of generality of the types in reality which its nodes represent.


Figure 1: “Rakkasan” Commander Col. Luong issues an opening statement at the start of a sand table briefing. The pieces on the sand table are the result of an ontological process of categorization of the entities in the relevant domain [5].


(Indentation = is_a, or subtype)

vehicle =def: an object used for transporting people or goods
    tractor =def: a vehicle that is used for towing
        artillery tractor =def: a tractor that is used to tow artillery pieces
            wheeled artillery tractor =def: an artillery tractor that runs on wheels
            tracked artillery tractor =def: an artillery tractor that runs on caterpillar track

Table 1: Examples of definitions used in SE ontologies.
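The hierarchy in Table 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SE project's tooling: each term is assumed to be stored as a record holding its preferred label, a single is_a parent and one definition, which is enough to recover every supertype of a term and to check, roughly, that each definition names its parent as genus.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    label: str             # preferred label: one term per type
    parent: Optional[str]  # single is_a parent (None at the top of this fragment)
    definition: str        # one definition per type, stated as genus + differentia


# The Table 1 terms, keyed by preferred label (the storage format is an assumption).
TERMS = {
    t.label: t
    for t in (
        Term("vehicle", None, "an object used for transporting people or goods"),
        Term("tractor", "vehicle", "a vehicle that is used for towing"),
        Term("artillery tractor", "tractor", "a tractor that is used to tow artillery pieces"),
        Term("wheeled artillery tractor", "artillery tractor",
             "an artillery tractor that runs on wheels"),
        Term("tracked artillery tractor", "artillery tractor",
             "an artillery tractor that runs on caterpillar track"),
    )
}


def ancestors(label: str) -> list[str]:
    """Return the is_a chain above `label`, nearest supertype first."""
    chain = []
    parent = TERMS[label].parent
    while parent is not None:
        chain.append(parent)
        parent = TERMS[parent].parent
    return chain


def is_a(child: str, ancestor: str) -> bool:
    """True if `child` is `ancestor` itself or one of its transitive subtypes."""
    return child == ancestor or ancestor in ancestors(child)


def genus_matches_parent(term: Term) -> bool:
    """Rough textual check that a definition opens with its parent term as genus."""
    if term.parent is None:
        return True
    return term.definition.startswith(("a " + term.parent, "an " + term.parent))


print(ancestors("wheeled artillery tractor"))  # ['artillery tractor', 'tractor', 'vehicle']
print(all(genus_matches_parent(t) for t in TERMS.values()))  # True
```

Because every term has exactly one parent, adding a new branch (say, a further subtype of tractor) leaves the ancestor chains of all existing terms unchanged, which is the stability property described above.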

The ULO is maximally general; it provides a high-level categorization relating to distinctions such as that between an object and a process, or between an object and its qualities (for example, temperature), roles (for example, commander), and spatial locations.

The MLOs are general representations formulated using terms (such as database, person, organization) which will be needed by specific communities of SE users and developers.

At the bottom of the hierarchy are the LLOs, each representing some narrow homogeneous portion of reality. In the SE approach, the LLOs represent reality in such a way that:

1. For each salient domain, exactly one LLO is constructed that is in conformity with the settled science or military doctrine in that domain.

2. The LLOs are orthogonal (they do not share any terms).

3. They are designed to reduce the need for (typically fragile and costly) mappings between ontologies covering the same or overlapping domains.

4. They are able to be used as reliable starting points for the development of the cross-domain ontologies needed for all of intelligence and for specific areas of intelligence analysis.
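Orthogonality (principle 2) is mechanically checkable once each LLO is reduced to a set of term labels. The sketch below assumes that representation; the term sets are abbreviated from the examples in this article, and the `EmployeeName` LLO used to show a violation is hypothetical.

```python
from itertools import combinations

# Term sets for three LLOs, abbreviated from the examples in this article.
LLOS = {
    "EyeColor": {"AmberEyeColor", "BlueEyeColor", "BrownEyeColor",
                 "GrayEyeColor", "GreenEyeColor", "HazelEyeColor"},
    "HairColor": {"AuburnHairColor", "BlackHairColor", "BlondHairColor",
                  "BrownHairColor", "GrayHairColor", "WhiteHairColor"},
    "PersonName": {"FirstName", "LastName", "Nickname"},
}


def shared_terms(llos):
    """Map each pair of LLOs that share terms to the set of shared terms."""
    clashes = {}
    for (name_a, terms_a), (name_b, terms_b) in combinations(llos.items(), 2):
        common = terms_a & terms_b
        if common:
            clashes[(name_a, name_b)] = common
    return clashes


print(shared_terms(LLOS))  # {}: the three LLOs are orthogonal

# A hypothetical EmployeeName LLO that re-declares LastName violates principle 2.
print(shared_terms({**LLOS, "EmployeeName": {"LastName", "EmployeeNumber"}}))
# {('PersonName', 'EmployeeName'): {'LastName'}}
```

A clash reported here is exactly the situation principle 3 aims to avoid: two LLOs covering overlapping content would otherwise need a mapping between them.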
Physical property
    Human anatomical property
        Anatomical scar
        Birth mark
        Bodily piercing
        EyeColor
            AmberEyeColor, BlueEyeColor, BrownEyeColor, GrayEyeColor, GreenEyeColor, HazelEyeColor
        Hair color
            AuburnHairColor, BlackHairColor, BlondHairColor, BrownHairColor, ChestnutHairColor, GrayHairColor, RedHairColor, WhiteHairColor
        Hairstyle
        Human anatomical build
        Human anatomical height
        Human anatomical weight
        Human facial feature
        Physical condition
        Skin color

Figure 2: Human Anatomical Property Ontology


An example SE LLO is illustrated in Figure 2. Other examples are:

• PersonName (with types: FirstName, LastName, Nickname, ...)

• PersonIdentification (with types: SocialSecurityNumber, DriverLicenseNumber, ...)

• PersonDate (with types: BirthDate, DeathDate, ...)

• InformationProvenance (with types: Origin, Credibility, Confidence, ...)

• Evidence (with types: ConfirmingEvidence, ContravertingEvidence, ...)

The SE approach is designed to be of maximal utility to intelligence analysts as users of data. Ontology content is created only in response to identified situational needs of analysts, and architectural requirements are designed to ensure coherent evolution of the SE resource without sacrificing the flexibility and expressivity needed in actual deployment in the field. As more experience is gained using SE ontologies, intelligence analysts will uncover new ways to exploit the SE resource, and new groups of users will begin to see the benefits to be gained from developing their own complementary ontology resources in a way that is compliant with the SE architecture. Their data will then progressively become integrated with existing SE resources, bringing benefits through increases in the amount, variety and quality of data upon which intelligence analysts can draw [11].

In this way, following a pattern that has been realized already in biology and other domains [12], the SE strategy will engender collaborative ontology development and re-use over multiple data collection endeavors, both internal and external.

The Discipline of Intelligence Analysis

Joint doctrine [13] defines multiple hierarchically organized disciplines, for example, intelligence, information operations, and cyberspace operations; the discipline of Intelligence in its turn has doctrinally defined sub-disciplines such as Human Intelligence (HUMINT), Signals Intelligence (SIGINT), and imagery intelligence [14].

On the typical approach to intelligence analysis, each new set of analytical problems rests on its own collection of data sources, which must be identified and integrated in ad hoc fashion through manual effort by the analyst. A typical analyst may be working with hundreds of data sources, with each source coming from a particular discipline such as HUMINT or Geospatial Intelligence (GEOINT). For an analyst to come to a conclusion or decision, he has to verify each particular piece of information in three distinct disciplines. For example, if a GEOINT source says that location X is ‘bad’, then there has to be something in, say, a HUMINT and a SIGINT source that confirms this statement.
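The three-discipline rule can be stated as a simple check. The sketch below uses hypothetical report records and assumes that reports making the same claim have already been normalized to the same claim string; without consistent tagging of the kind SE provides, that normalization is itself the hard part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    source: str      # hypothetical source identifier
    discipline: str  # e.g. "GEOINT", "HUMINT", "SIGINT"
    claim: str       # normalized statement the report supports


def corroborated(claim, reports, required=3):
    """True if reports supporting `claim` come from at least `required` distinct disciplines."""
    disciplines = {r.discipline for r in reports if r.claim == claim}
    return len(disciplines) >= required


reports = [
    Report("geo-17", "GEOINT", "location X is bad"),
    Report("hum-04", "HUMINT", "location X is bad"),
    Report("hum-09", "HUMINT", "location X is bad"),  # a second HUMINT report adds no new discipline
]
print(corroborated("location X is bad", reports))  # False: only GEOINT and HUMINT

reports.append(Report("sig-02", "SIGINT", "location X is bad"))
print(corroborated("location X is bad", reports))  # True
```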

Already here we see the vital need for integration of heterogeneous data for purposes of intelligence analysis. The SE approach has evolved in response to the general recognition that traditional approaches to such integration, both physical and virtual, are increasingly failing in the face of the scale, diversity, and heterogeneity of many data sources and data models.

Such traditional approaches fail where they do not address the following requirements:

• Integration must occur without the need for heavy pre-processing of the data artifacts which need to be integrated.

• Integration must occur without loss or distortion of data.

• The integration approach must be able to evolve to accommodate highly heterogeneous and rapidly evolving data.

Already the tagging of intelligence data in consistent fashion by drawing on a simple ontology for describing the different kinds of sources brings benefits to the analyst in a way that meets all of these requirements.

Case Study Illustrating the Benefits Brought by SE to Intelligence Analysis

In what follows we illustrate how these benefits are realized in terms of a simple case study in which the SE approach is applied to a set of cloud-based data sources, including text, images, audio, and signals, as described in [3]. These data sources are stored together with structured descriptions of their associated data models. The problem that SE is designed to solve arises because different data models can present data about the same entities in arbitrarily many different ways, as illustrated on the left of Figure 3. The SE ontology content illustrated on the right, in contrast, employs simple terms in a stable fashion to ensure that entities of the same types are always represented in the same way.

Figure 3: Samples of data models, in which arbitrary combinations are allowed (LEFT, “Multiple Data Models”), vs. SE ontologies, with their constrained hierarchies (RIGHT, “Single Ontology”)

SE terms are associated with the labels used in the native data model descriptions, as in Tables 2 and 4. To enable benefits from this association in the form of efficient search, the entire aggregated content of our data sources, both structured and unstructured, is indexed, using a Lucene index [15] distributed over Solr [16]. This Index, which is continuously being re-created to ensure synchronization with newly posted data, is a result of pre-materialization; that is, it reflects pre-calculations of the answers to sets of the most common queries posted by analysts.


We consider a simplified example using three native data sources, Db1-Db3, which we illustrate in each case by column labels and a single row of sample data. To see the sorts of problems we face, compare how, in Db1, ‘Java’ is used elliptically to mean ‘Java programming skill’, while ‘Name’ is used to mean ‘Name of skill’.


• Source database Db1, with tables Db1.Person and Db1.Skill, containing person data and data pertaining to skills of different kinds, respectively.

Db1.Person
PersonID    SkillID
111         222

Db1.Skill
SkillID    Name    Description
222        Java    Programming

July/August 2012
• Source database Db2.Person, containing data about IT personnel and their skills:

ID     SkillDescr
333    SQL

• Source database Db3.ProgrSkill, containing data about programmers’ skills:

EmplID    SkillName
444       Java


Second, we use SE ontologies as illustrated in Figure 3 to annotate the data from these databases. Sample results of this annotation are illustrated in Tables 2-4, which are representative of the kinds of tables contained in our aggregated store.

Table 2 contains sample labels used in annotations. The rows of Table 3 represent sample annotations using SE ontology terms. The rows of Table 4 consist of sample statements of the sorts used both in storing native data and in generating the Index.


Label          Source
PersonID       Db1.Person
SkillID        Db1.Skill
Name           Db1.Skill
Description    Db1.Skill
ID             Db2.Person
SkillDescr     Db2.Person
EmplID         Db3.ProgrSkill
SkillName      Db3.ProgrSkill

Table 2. Sample labels from source data models and from the SE ontologies


Source Artifact Label    SE Label
Db1.Name                 SE.Skill
Db2.SkillDescr           SE.ComputerSkill
Db3.SkillName            SE.ProgrammingSkill
Db1.PersonID             SE.PersonID
Db2.ID                   SE.PersonID
Db3.EmplID               SE.PersonID

Table 3. Sample annotations of labels in source artifacts


The process of annotation proceeds manually as follows. The annotator is required to apply to each label in the target data model the term at the lowest level in the SE hierarchy whose application is still warranted (1) by the meaning of the label and (2) by information the annotator has about the database in question, including (3) information concerning the data values labeled. For example, Db1 contains data about skills in many areas; its label Skill must therefore be annotated with the general term Skill and not with any more specific term. Db2 is known to contain only data about skills in the area of IT; this warrants the use of ComputerSkill in annotating its label SkillDescr.
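The annotation rule is applied manually, but its core can be expressed as a lowest-common-ancestor computation. The sketch below is our own formalization, not part of the SE process as described: it assumes that what the annotator knows about a source can be summarized as the list of skill types its values fall under. `LanguageSkill` and the covered lists are hypothetical, chosen so that the results agree with the Table 3 annotations.

```python
from __future__ import annotations

# Assumed fragment of a Skill LLO (child -> is_a parent). ComputerSkill,
# ProgrammingSkill and NetworkSkill appear in the text; LanguageSkill is hypothetical.
PARENT = {
    "ComputerSkill": "Skill",
    "ProgrammingSkill": "ComputerSkill",
    "NetworkSkill": "ComputerSkill",
    "LanguageSkill": "Skill",
}


def path_to_root(term: str) -> list[str]:
    """Return [term, parent, grandparent, ..., root]."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lowest_warranted_term(covered: list[str]) -> str:
    """Most specific term that still covers every kind of skill a source is known to hold."""
    paths = [path_to_root(t) for t in covered]
    common = set(paths[0]).intersection(*paths[1:])
    # In a single-parent hierarchy the common ancestors form one chain, so the
    # first of them met when walking up any path is the lowest common ancestor.
    return next(t for t in paths[0] if t in common)


print(lowest_warranted_term(["ProgrammingSkill", "LanguageSkill"]))  # Skill (like Db1)
print(lowest_warranted_term(["ProgrammingSkill", "NetworkSkill"]))   # ComputerSkill (like Db2)
print(lowest_warranted_term(["ProgrammingSkill"]))                   # ProgrammingSkill (like Db3)
```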

The Index contains entries of various sorts, as represented in Table 4. Which sorts of entities we index is determined by the ontologies for Person, Place, and so on. The subservient LLOs, which provide the SE labels to be used in annotations for different sorts of data, are used in formulating the field-value pairs associated with Index entries.

Currently, the SE Index incorporates the results of inferences over an initial tranche of semantically enhanced content. In Table 5 we see how the Index looks when it is able to incorporate the results of integration over the SE annotations. These inferences rest on the logical structure of the SE ontologies and of their constituent definitions. For example, the term Programmer is defined as Person with programming skill, and the Skill LLO incorporates an inferred subclassification of persons, which is represented in the Index using the SubType field (see the entry for PersonID=444 in Table 5).
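The inference that places person 444 under Programmer can be sketched as a rule derived from the definition. This assumes a simplified encoding of the definition as a required Type plus a required field (`DEFINED_SUBTYPES` is a hypothetical name); the actual inferences run over the logical structure of the SE ontologies, not directly over Index fields.

```python
# Index entries as in Table 5, before the inferred SubType field is added.
entries = {
    "111": {"Type": "Person", "Skill": "Java", "Db1.Description": "Programming"},
    "333": {"Type": "Person", "ComputerSkill": "SQL"},
    "444": {"Type": "Person", "ProgrammingSkill": "Java"},
}

# Hypothetical encoding of "Programmer =def: Person with programming skill":
# (type the entry must have, field that must be present, subtype to infer).
DEFINED_SUBTYPES = [("Person", "ProgrammingSkill", "Programmer")]


def infer_subtypes(entries, rules):
    """Add a SubType field to every entry that satisfies a defining condition."""
    for fields in entries.values():
        for required_type, required_field, subtype in rules:
            if fields.get("Type") == required_type and required_field in fields:
                fields["SubType"] = subtype
    return entries


infer_subtypes(entries, DEFINED_SUBTYPES)
print({pid: f.get("SubType") for pid, f in entries.items()})
# {'111': None, '333': None, '444': 'Programmer'}
```

Person 111 is not classified even though the value is Java, because Db1's label was annotated only with the general term Skill; the rule does not go beyond what the annotations warrant.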

When creating the Index, the indexing process crawls statements of the sorts shown in Table 4 and uses SE labels for the Index fields wherever these are available. Thus, as Table 5 illustrates, we obtain fields carrying terms from the LLO Skill and the LLO PersonIdentification, as follows:


Index Entry      Associated Field-Value
111, PersonID    Type: Person; Skill: Java; Db1.Description: Programming
333, PersonID    Type: Person; ComputerSkill: SQL
444, PersonID    Type: Person; SubType: Programmer; ProgrammingSkill: Java

Table 5. Sample Entries of the Dataspace Index based on the SE

Some native content is not (or not yet) covered by the SE (the Description label from Db1.Skill in our example), reflecting the incremental nature of the SE process. Indexing in such cases is effected using native labels. In this way, incomplete SE coverage of native models does not entail unavailability of the corresponding data to analysts’ searches.


To begin to see the benefits of SE for data integration, note how three distinct items in the first column of Table 3 (PersonID from Db1, ID from Db2, and EmplID from Db3) are all annotated with the same SE expression, namely PersonID from the PersonIdentification LLO.


Data Value and Associated Label    Relation              Data Value and Associated Label
111, Db1.PersonID                  Db1.hasSkillID        222, Db1.SkillID
222, Db1.SkillID                   Db1.hasName           Java, Db1.Name
222, Db1.SkillID                   Db1.hasDescription    Programming, Db1.Description
333, Db2.ID                        Db2.hasSkillDescr     SQL, Db2.SkillDescr
444, Db3.EmplID                    Db3.hasSkillName      Java, Db3.SkillName

Table 4. Statements illustrating the sorts of source data used in compiling the Index
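The indexing step, including the fallback to native labels for unannotated content, can be sketched over the sample data of Tables 3 and 4. This is a plain-Python stand-in for the Lucene/Solr pipeline, not its implementation, and it makes two simplifying assumptions: a join key such as Db1's SkillID is followed exactly one hop, and every entry keyed by a value annotated as SE PersonID is given Type: Person. The "SE." prefix is dropped from field names, as in Table 5.

```python
# Table 3: native label -> SE label (the "SE." prefix is dropped, as in Table 5).
ANNOTATIONS = {
    "Db1.Name": "Skill",
    "Db2.SkillDescr": "ComputerSkill",
    "Db3.SkillName": "ProgrammingSkill",
    "Db1.PersonID": "PersonID",
    "Db2.ID": "PersonID",
    "Db3.EmplID": "PersonID",
}

# Table 4: (subject value, subject label, relation, object value, object label).
STATEMENTS = [
    ("111", "Db1.PersonID", "Db1.hasSkillID", "222", "Db1.SkillID"),
    ("222", "Db1.SkillID", "Db1.hasName", "Java", "Db1.Name"),
    ("222", "Db1.SkillID", "Db1.hasDescription", "Programming", "Db1.Description"),
    ("333", "Db2.ID", "Db2.hasSkillDescr", "SQL", "Db2.SkillDescr"),
    ("444", "Db3.EmplID", "Db3.hasSkillName", "Java", "Db3.SkillName"),
]


def build_index(statements, annotations):
    """Build one Index entry per person ID, preferring SE labels for field names."""
    outgoing = {}
    for s_val, s_lab, _relation, o_val, o_lab in statements:
        outgoing.setdefault((s_val, s_lab), []).append((o_val, o_lab))

    index = {}
    for (s_val, s_lab), objects in outgoing.items():
        if annotations.get(s_lab) != "PersonID":
            continue  # only values annotated as SE PersonID become person entries
        fields = {"Type": "Person"}  # assumed to follow from the PersonID annotation
        for o_val, o_lab in objects:
            # A join key with statements of its own (Db1's SkillID) is followed one hop.
            for t_val, t_lab in outgoing.get((o_val, o_lab), [(o_val, o_lab)]):
                # SE label where one exists; otherwise the qualified native label.
                fields[annotations.get(t_lab, t_lab)] = t_val
        index[s_val] = fields
    return index


for pid, fields in build_index(STATEMENTS, ANNOTATIONS).items():
    print(pid, fields)
# 111 {'Type': 'Person', 'Skill': 'Java', 'Db1.Description': 'Programming'}
# 333 {'Type': 'Person', 'ComputerSkill': 'SQL'}
# 444 {'Type': 'Person', 'ProgrammingSkill': 'Java'}
```

The output matches Table 5 apart from the inferred SubType field, and the unannotated Description value remains searchable under its native label.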
A Sample Query Illustrating the Advantages Brought by SE

Suppose the analyst needs to use the Index in order to find, for example, all instances of the type Person referenced in the Dataspace as having some predefined set of skills. When addressed to the sample entries in Table 5, this will yield results as in Table 6.

To see the advantages that have been brought to the human analyst by the SE process, contrast now Table 7, which shows Index entries corresponding to those of Table 5 as they would have been generated prior to SE. Table 7 reveals two sorts of obstacles faced by the analyst using pre-SE data. First, because person IDs and names of skills in the native sources are listed under many different headings, querying these sources without SE, even for simple person ID or skill information, requires knowledge on the part of the analyst of the idiosyncrasies of each data source. Second, because data models are flat, in the sense that they do not define hierarchical relations between more general and more specific types, querying across sources that contain data at different levels of detail is virtually impossible.

Indeed, however much manual effort the analyst is able to apply in performing search supported by the Index entries illustrated in Table 7, the information he will gain will still be meager in comparison with what is made available through Table 5. Even if an analyst is familiar with the labels used in Db1, for example, and is thus in a position to enter Name = Java, his query will still return only person 111. Directly salient Db3 information (person 444, whose Java skill is recorded under SkillName) will thus be missed.

Conclusion 

Analysts are of course trained to be aware of the types of information that are available in different sources. But in today’s dynamic environment, in which ever more domains and ever more associated data sources become salient to intelligence analysis, it is practically impossible for any analyst to know the content of all sources. The likelihood that important data will be missed remains very high, and the need for agile support for retrieval and integration of the sort provided through the strategy of semantic enhancement becomes all the more urgent. This strategy was designed, in effect, to remedy some of the consequences of the inevitable lack of coordination in the development of information resources in the intelligence domain, and thereby to support massed informatics fires against ever-new types of intelligence targets.
• entering Skill = Java (which will be rewritten at run time as: Skill = Java OR ComputerSkill = Java OR ProgrammingSkill = Java OR NetworkSkill = Java) will return: persons 111 and 444

• entering ComputerSkill = Java OR ComputerSkill = SQL will return: persons 333 and 444

• entering ProgrammingSkill = Java will return: person 444

• entering Description = Programming will return: person 111

• entering SubType = Programmer will return: person 444

Table 6: Sample queries over the Dataspace Index and their results with SE
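The run-time rewriting in the first row of Table 6 amounts to expanding each queried field into itself plus all of its subtypes. The sketch below assumes the skill hierarchy implied by the Table 6 rewrites (ComputerSkill under Skill; ProgrammingSkill and NetworkSkill under ComputerSkill) and searches the Table 5 entries held in a dictionary rather than a Lucene index; the function names are hypothetical.

```python
# Assumed is_a hierarchy for skill fields, as implied by the rewrites in Table 6.
CHILDREN = {
    "Skill": ["ComputerSkill"],
    "ComputerSkill": ["ProgrammingSkill", "NetworkSkill"],
}

INDEX = {  # the Table 5 entries
    "111": {"Type": "Person", "Skill": "Java", "Db1.Description": "Programming"},
    "333": {"Type": "Person", "ComputerSkill": "SQL"},
    "444": {"Type": "Person", "SubType": "Programmer", "ProgrammingSkill": "Java"},
}


def expand(field):
    """Return `field` followed by all of its transitive subtypes."""
    result = [field]
    for child in CHILDREN.get(field, []):
        result.extend(expand(child))
    return result


def rewrite(disjuncts):
    """Rewrite (field, value) disjuncts into the equivalent disjunction over subtypes."""
    return [(f, value) for field, value in disjuncts for f in expand(field)]


def search(index, disjuncts):
    """Return the sorted IDs of entries matching any disjunct after rewriting."""
    terms = rewrite(disjuncts)
    return sorted(pid for pid, fields in index.items()
                  if any(fields.get(f) == v for f, v in terms))


print(search(INDEX, [("Skill", "Java")]))                                # ['111', '444']
print(search(INDEX, [("ComputerSkill", "Java"), ("ComputerSkill", "SQL")]))  # ['333', '444']
print(search(INDEX, [("ProgrammingSkill", "Java")]))                     # ['444']
print(search(INDEX, [("SubType", "Programmer")]))                        # ['444']
```

Because the expansion is driven by the ontology rather than by the analyst, adding a new subtype of ComputerSkill would widen the first two queries without any change to how they are entered.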


Index Entry      Associated Field-Value
111, PersonID    Type: Person; Name: Java; Description: Programming
333, ID          Type: Person; SkillDescr: SQL
444, EmplID      Type: Person; SkillName: Java

Table 7. Sample Entries of the Dataspace Index prior to SE
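For contrast, a search over the pre-SE entries of Table 7 can only match exact native field names. The sketch below (the function name is hypothetical) reproduces the Name = Java query discussed in the text.

```python
# Pre-SE Index entries from Table 7: each source keeps its own native field names.
PRE_SE_INDEX = {
    "111": {"Type": "Person", "Name": "Java", "Description": "Programming"},
    "333": {"Type": "Person", "SkillDescr": "SQL"},
    "444": {"Type": "Person", "SkillName": "Java"},
}


def native_search(index, field, value):
    """Exact field match with no ontology behind it: no shared labels, no subtypes."""
    return sorted(pid for pid, fields in index.items() if fields.get(field) == value)


print(native_search(PRE_SE_INDEX, "Name", "Java"))       # ['111'] (person 444 is missed)
print(native_search(PRE_SE_INDEX, "SkillName", "Java"))  # ['444'] (only if Db3's label is known)
```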